{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Series和DataFrame对象的创建\n",
    "pandas中的核心对象是Series和DataFrame，这一节主要介绍如何创建这两种对象。\n",
    "    \n",
    "__auther__ = 'zhenhang.sun@gmail.com'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:15:51.389871Z",
     "start_time": "2017-09-25T14:15:51.202730Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'D:\\\\新建文件夹\\\\pandas-tutorial'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pwd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:03.547342Z",
     "start_time": "2017-09-25T14:15:51.394374Z"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "# 1. Series"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Series是pandas中暴露给我们使用的基本对象，它是由相同元素类型构成的一维数据结构，同时具有列表和字典的属性，字典的属性由索引赋予。\n",
    "\n",
    "    Series：有序，有索引\n",
    "    list：  有序，无索引\n",
    "    dict：  无序，有索引"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.1 预览"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:04.076194Z",
     "start_time": "2017-09-25T14:16:03.551337Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1\n",
       "b    2\n",
       "c    3\n",
       "Name: sss, dtype: int64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = [1,2,3]\n",
    "index = ['a','b','c']\n",
    "s = pd.Series(data=data, index=index, name = 'sss')\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:04.089203Z",
     "start_time": "2017-09-25T14:16:04.082197Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['a', 'b', 'c'], dtype='object')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.index  # 四个属性之一：索引"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:04.323077Z",
     "start_time": "2017-09-25T14:16:04.091205Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'sss'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.name  # 四个属性之二：名字，"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:04.639450Z",
     "start_time": "2017-09-25T14:16:04.327080Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 2, 3], dtype=int64)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.values # 四个属性之三：值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:04.822963Z",
     "start_time": "2017-09-25T14:16:04.641434Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dtype('int64')"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.dtype # 四个属性之四：元素类型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----\n",
    "## 1.2 创建\n",
    "#### `pd.Series(data=None, index=None, name = None)`\n",
    "- data：多种类型，见下面具体介绍；\n",
    "- index：索引信息；\n",
    "- name：对data的说明，用的不多，一般在和DataFrame、Index互相转换时才需要。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2.1 data无索引\n",
    "- 如果 data 为 **ndarray(1D) 或 list(1D)**，那么其缺少 Series 需要的索引信息；\n",
    "- 如果提供 index，则必须和data长度相同；\n",
    "- 如果不提供 index，那么其将生成默认数值索引 range(0, data.shape[0])。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:05.175335Z",
     "start_time": "2017-09-25T14:16:04.825968Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1\n",
       "b    2\n",
       "c    3\n",
       "dtype: int32"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# data = [1,2,3]\n",
    "data1 = np.array([1,2,3])\n",
    "index1 = ['a','b','c']\n",
    "s = pd.Series(data = data1, index = index1)\n",
    "s"
   ]
  },
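  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two index rules above can be checked with a minimal sketch. It assumes only `numpy` and `pandas`; the names `values`, `default_s`, and `length_error` are illustrative and not part of the original tutorial.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "values = np.array([1, 2, 3])\n",
    "\n",
    "# No index given: pandas builds a default integer index 0..len(values)-1.\n",
    "default_s = pd.Series(values)\n",
    "print(list(default_s.index))\n",
    "\n",
    "# An index whose length differs from the data is rejected with ValueError.\n",
    "try:\n",
    "    pd.Series(values, index=['a', 'b'])\n",
    "    length_error = False\n",
    "except ValueError:\n",
    "    length_error = True\n",
    "print(length_error)\n",
    "```\n",
    "\n",
    "Wrapping the failing call in `try`/`except` keeps the cell runnable while still showing that the mismatch is an error rather than a silent truncation."
   ]
  },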
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2.2 data有索引\n",
    " - 如果 data 为 **Series 或 dict** ，那么其已经提供了 Series 需要的索引信息，所以 index 项是不需要提供的；\n",
    " - 如果额外提供了 index 项，那么其将对当前构建的Series进行 重索引（增删）（等同于reindex操作）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:05.540456Z",
     "start_time": "2017-09-25T14:16:05.181344Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1.0\n",
       "b    2.0\n",
       "d    NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# data = pd.Series([a,b,c], index = ['a','b','c'] )\n",
    "data2 = { 'a':1, 'b':2,'c':3 }\n",
    "index2 = ['a','b','d']\n",
    "s = pd.Series(data = data2, index = index2)\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 如上，index项用于从当前已有索引中匹配出相同的行，如果当前索引缺失给定的索引，则填充NaN（NaN：not a number为pandas缺失值标记）。"
   ]
  },
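  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough check of the reindexing behavior described above, the sketch below compares passing `index` to the constructor with calling `reindex` afterwards. The names `mapping`, `labels`, `from_constructor`, and `from_reindex` are illustrative assumptions.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "mapping = {'a': 1, 'b': 2, 'c': 3}\n",
    "labels = ['a', 'b', 'd']\n",
    "\n",
    "# index= alongside a dict keeps the matching labels and fills the rest with NaN.\n",
    "from_constructor = pd.Series(mapping, index=labels)\n",
    "\n",
    "# Building the Series first and reindexing it gives the same result.\n",
    "from_reindex = pd.Series(mapping).reindex(labels)\n",
    "\n",
    "print(from_constructor.equals(from_reindex))\n",
    "print(from_constructor.isna().tolist())\n",
    "```\n",
    "\n",
    "`Series.equals` treats NaN values in the same position as equal, which makes it convenient for this kind of comparison."
   ]
  },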
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "----\n",
    "# 2. DataFrame\n",
    "DataFrame由具有共同索引的Series按列排列构成（2D），是使用最多的对象。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.1 预览"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:06.079948Z",
     "start_time": "2017-09-25T14:16:05.549462Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   A  B  C\n",
       "a  1  2  3\n",
       "b  4  5  6"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = [[1,2,3],\n",
    "       [4,5,6]]\n",
    "index = ['a','b']\n",
    "columns = ['A','B','C']\n",
    "df = pd.DataFrame(data=data, index = index, columns = columns)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:06.088969Z",
     "start_time": "2017-09-25T14:16:06.082952Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['a', 'b'], dtype='object')"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.index  # 行索引"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:06.299196Z",
     "start_time": "2017-09-25T14:16:06.091956Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['A', 'B', 'C'], dtype='object')"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns  # 列索引，由Series的name构成"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:06.454532Z",
     "start_time": "2017-09-25T14:16:06.308708Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 2, 3],\n",
       "       [4, 5, 6]], dtype=int64)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.values "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:06.696251Z",
     "start_time": "2017-09-25T14:16:06.456534Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A    int64\n",
       "B    int64\n",
       "C    int64\n",
       "dtype: object"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes  # 这里的dtype带s，查看每列元素类型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----\n",
    "## 2.2 创建\n",
    "#### `pd.DataFrame(data=None, index=None, columns=None)`\n",
    "函数有多个参数，对我们有用的主要是：`data`,`index`和`columns`三项"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1 data无 行索引，无 列索引\n",
    "- 如果 data 为 **ndarray(2D) or list(2D)**，那么其缺少 DataFrame 需要的行、列索引信息；\n",
    "- 如果提供 index 或 columns 项，其必须和data的行 或 列长度相同；\n",
    "- 如果不提供 index 或 columns 项，那么其将默认生成数值索引range(0, data.shape[0])) 或 range(0, data.shape[1])。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:07.141602Z",
     "start_time": "2017-09-25T14:16:06.702255Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   A  B  C\n",
       "a  1  2  3\n",
       "b  4  5  6"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# data = [[1,2,3],\n",
    "#        [4,5,6]]\n",
    "data1 = np.array([[1,2,3],\n",
    "                  [4,5,6]] )\n",
    "index1 = ['a','b']\n",
    "columns1 = ['A','B','C']\n",
    "df = pd.DataFrame(data=data1, index = index1, columns = columns1)\n",
    "df"
   ]
  },
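  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small sketch of the defaults and the length rule above, assuming only `numpy` and `pandas`. The names `rows`, `plain_df`, and `shape_error` are illustrative. The failing case uses an ndarray because that path raises `ValueError` consistently.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "rows = [[1, 2, 3],\n",
    "        [4, 5, 6]]\n",
    "\n",
    "# Neither index nor columns given: both default to 0..n-1.\n",
    "plain_df = pd.DataFrame(rows)\n",
    "print(list(plain_df.index), list(plain_df.columns))\n",
    "\n",
    "# columns must match the number of columns in the data (3 here).\n",
    "try:\n",
    "    pd.DataFrame(np.array(rows), columns=['A', 'B'])\n",
    "    shape_error = False\n",
    "except ValueError:\n",
    "    shape_error = True\n",
    "print(shape_error)\n",
    "```"
   ]
  },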
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 data无 行索引，有 列索引\n",
    " - 如果data为 **dict of (ndarray(1D) or list(1D))**，所有ndarray或list的长度必须相同。dict的key为DataFrame提供了需要的columns信息，缺失index；\n",
    " - 如果提供 index 项，必须和list的长度相同；\n",
    " - 如果不提供 index，那么其将默认生成数值索引range(0, data.shape[0]))；\n",
    " - 如果还额外提供了columns项，那么其将对当前构建的DataFrame进行 **列重索引**。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:07.349774Z",
     "start_time": "2017-09-25T14:16:07.146606Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   A  B    D\n",
       "a  1  2  NaN\n",
       "b  4  5  NaN"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data2 = { 'A' : [1,4], 'B': [2,5], 'C':[3,6] }\n",
    "index2 = ['a','b']\n",
    "columns2 = ['A','B','D']\n",
    "df = pd.DataFrame(data=data2, index = index2, columns = columns2)\n",
    "df"
   ]
  },
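  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The rules for a dict of lists can be sketched as follows. The names `cols`, `auto_df`, `picked_df`, and `ragged_error` are illustrative assumptions, not part of the original tutorial.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "cols = {'A': [1, 4], 'B': [2, 5], 'C': [3, 6]}\n",
    "\n",
    "# Keys become column labels; the row index defaults to 0..n-1.\n",
    "auto_df = pd.DataFrame(cols)\n",
    "print(list(auto_df.columns), list(auto_df.index))\n",
    "\n",
    "# columns= selects and reorders existing keys, and adds an all-NaN\n",
    "# column for a label ('D') that is not a key of the dict.\n",
    "picked_df = pd.DataFrame(cols, columns=['B', 'A', 'D'])\n",
    "print(list(picked_df.columns))\n",
    "print(picked_df['D'].isna().all())\n",
    "\n",
    "# Lists of unequal length are rejected with ValueError.\n",
    "try:\n",
    "    pd.DataFrame({'A': [1, 4], 'B': [2]})\n",
    "    ragged_error = False\n",
    "except ValueError:\n",
    "    ragged_error = True\n",
    "print(ragged_error)\n",
    "```"
   ]
  },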
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3 data有 行索引，有 列索引\n",
    " - 如果data为 **dict of (Series or dict)**，那么其已经提供了DataFrame需要的所有信息；\n",
    " - 如果多个Series或dict间的索引不一致，那么取并操作（pandas不会试图丢掉信息），缺失的数据填充NaN；\n",
    " - 如果提供了index项或columns项，那么其将对当前构建的DataFrame进行 重索引（reindex，pandas内部调用接口）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:07.570012Z",
     "start_time": "2017-09-25T14:16:07.351777Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>4.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A    B    C\n",
       "a  1.0  2.0  3.0\n",
       "b  4.0  5.0  NaN\n",
       "c  NaN  NaN  6.0"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# data3 = { 'A' : pd.Series([1,4] ,index = ['a','b']), 'B' : pd.Series([2,5] ,index = ['a','b']), 'C' : pd.Series([3,6] ,index = ['a','c']) }\n",
    "data3 = { 'A' : { 'a':1, 'b':4}, 'B': {'a':2,'b':5}, 'C':{'a':3, 'c':6} }\n",
    "df = pd.DataFrame(data=data3)\n",
    "df"
   ]
  },
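  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the union and reindexing rules above, using a dict of Series. The names `parts`, `union_df`, and `sub_df`, and the extra row label `'z'`, are illustrative assumptions.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "parts = {\n",
    "    'A': pd.Series([1, 4], index=['a', 'b']),\n",
    "    'B': pd.Series([2, 5], index=['a', 'b']),\n",
    "    'C': pd.Series([3, 6], index=['a', 'c']),\n",
    "}\n",
    "\n",
    "# Row labels are the union of all Series indexes; gaps become NaN.\n",
    "union_df = pd.DataFrame(parts)\n",
    "print(list(union_df.index))\n",
    "\n",
    "# index= and columns= reindex that result: rows 'b' and 'c' and column 'B'\n",
    "# are dropped, and the unknown row label 'z' is added as an all-NaN row.\n",
    "sub_df = pd.DataFrame(parts, index=['a', 'z'], columns=['A', 'C'])\n",
    "print(sub_df.shape)\n",
    "print(sub_df.loc['z'].isna().all())\n",
    "```"
   ]
  },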
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "# 3. 由文件创建"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.1 由.csv文件创建"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### `pd.read_csv(filepath_or_buffer, sep=',', header='infer', names=None,index_col=None, encoding=None ) `\n",
    "read_csv的参数很多，但这几个参数就够我们使用了：\n",
    "- filepath_or_buffer：路径和文件名不要带中文，带中文容易报错。\n",
    "- sep: csv文件数据的分隔符，默认是','，根据实际情况修改；\n",
    "- header：如果有列名，那么这一项不用改；\n",
    "- names：如果没有列名，那么必须设置header = None， names为需要传入的列名列表，不设置默认生成数值索引；\n",
    "- index_col：list of (int or name)，传入列名的列表或者列名的位置，选取这几列作为索引；\n",
    "- encoding：根据你的文档编码来确定，如果有中文读取报错，试试encoding = 'gbk'。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:08.148424Z",
     "start_time": "2017-09-25T14:16:07.573011Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>total_bill</th>\n",
       "      <th>tip</th>\n",
       "      <th>sex</th>\n",
       "      <th>smoker</th>\n",
       "      <th>day</th>\n",
       "      <th>time</th>\n",
       "      <th>size</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>16.99</td>\n",
       "      <td>1.01</td>\n",
       "      <td>Female</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10.34</td>\n",
       "      <td>1.66</td>\n",
       "      <td>Male</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21.01</td>\n",
       "      <td>3.50</td>\n",
       "      <td>Male</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>23.68</td>\n",
       "      <td>3.31</td>\n",
       "      <td>Male</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>24.59</td>\n",
       "      <td>3.61</td>\n",
       "      <td>Female</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   total_bill   tip     sex smoker  day    time  size\n",
       "0       16.99  1.01  Female     No  Sun  Dinner     2\n",
       "1       10.34  1.66    Male     No  Sun  Dinner     3\n",
       "2       21.01  3.50    Male     No  Sun  Dinner     3\n",
       "3       23.68  3.31    Male     No  Sun  Dinner     2\n",
       "4       24.59  3.61  Female     No  Sun  Dinner     4"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips = pd.read_csv( 'tips.csv')\n",
    "tips.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:08.291633Z",
     "start_time": "2017-09-25T14:16:08.151413Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=244, step=1)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:08.517552Z",
     "start_time": "2017-09-25T14:16:08.302620Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-25T14:16:08.899490Z",
     "start_time": "2017-09-25T14:16:08.520554Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[16.99, 1.01, 'Female', ..., 'Sun', 'Dinner', 2],\n",
       "       [10.34, 1.66, 'Male', ..., 'Sun', 'Dinner', 3],\n",
       "       [21.01, 3.5, 'Male', ..., 'Sun', 'Dinner', 3],\n",
       "       ...,\n",
       "       [22.67, 2.0, 'Male', ..., 'Sat', 'Dinner', 2],\n",
       "       [17.82, 1.75, 'Male', ..., 'Sat', 'Dinner', 2],\n",
       "       [18.78, 3.0, 'Female', ..., 'Thur', 'Dinner', 2]], dtype=object)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.values"
   ]
  },
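  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To try the `sep`, `header`, `names`, and `index_col` parameters without a file on disk, the sketch below reads from an in-memory buffer. The CSV text and the column names `key`, `count`, and `score` are made up for illustration.\n",
    "\n",
    "```python\n",
    "import io\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical data: no header row, ';' as the delimiter.\n",
    "csv_text = '''x;1;2.5\n",
    "y;2;3.5\n",
    "'''\n",
    "\n",
    "# header=None: the first line is data; names supplies the column labels;\n",
    "# index_col uses the 'key' column as the row index.\n",
    "frame = pd.read_csv(io.StringIO(csv_text), sep=';', header=None,\n",
    "                    names=['key', 'count', 'score'], index_col='key')\n",
    "print(list(frame.index), list(frame.columns))\n",
    "print(frame.loc['y', 'score'])\n",
    "```\n",
    "\n",
    "`io.StringIO` works here because `filepath_or_buffer` accepts file-like objects as well as paths."
   ]
  },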
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "---\n",
    "## 3.2 由.excel文件创建"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### `pd.read_excel(io, sheetname=0, header=0, index_col=None, names=None) `\n",
    "read_excel的参数很多，但这几个参数就够我们使用了：\n",
    "- header：如果有列名，那么这一项不用改；\n",
    "- names：如果没有列名，那么必须设置header = None， names为列名的列表，不设置默认生成数值索引；\n",
    "- index_col：同上。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {
    "height": "282px",
    "width": "252px"
   },
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "485px",
    "left": "0px",
    "right": "1068px",
    "top": "66px",
    "width": "212px"
   },
   "toc_section_display": "block",
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
